Force-clamp analysis techniques reveal stretched 
exponential unfolding kinetics in ubiquitin 



Herbert Lannon*, Eric Vanden-Eijnden+, and J. Brujic*†

*Department of Physics and Center for Soft Matter Research, New York University, 4 Washington Place, New York, NY 10003, USA
+Courant Institute of Mathematical Sciences, New York University, New York, NY 10012

†Corresponding author. 608 Meyer Hall, 4 Washington Place, New York, NY 10003, USA. Tel: (212) 998-3586. Email: jb2929@nyu.edu



Abstract 



Force-clamp spectroscopy measures the unfolding and disulfide bond rupture times of single protein molecules as a function of the stretching force, point mutations and solvent conditions. The statistics of these times reveal whether the protein domains are independent of one another, the mechanical hierarchy in the polyprotein chain, and the functional form of the probability distribution from which they originate. It is therefore important to use robust statistical tests to identify the correct theoretical model underlying the process. Here we develop multiple techniques to compare existing theoretical models with the well-established experimental data set on ubiquitin as a case study. We show that robustness against filtering, agreement with a maximum likelihood function that takes into account experimental artifacts, the Kuiper statistic test and alignment with synthetic data all identify the Weibull, or stretched exponential, distribution as the best fitting model. Our results are inconsistent with recently proposed models of Gaussian disorder in the energy landscape or noise in the applied force as explanations for the observed non-exponential kinetics. Since the physical model used in the fit affects the characteristic unfolding time, these results have important implications for our understanding of the biological function of proteins.

Key words: maximum likelihood; static disorder; rare events; data filtering; noise analysis



Force-clamp spectroscopy analysis tools 






Introduction 

Force-clamp spectroscopy using the atomic force microscope (AFM) has proven to be a useful tool for following the unfolding trajectories of single polyprotein molecules (…). Previous studies have investigated the effect of the applied force (4, 7–10), the length of the polyprotein chain (11, 12) and order statistics (…) on the unfolding kinetics of mechanically stable proteins. The simplest free energy landscape model for mechanical unfolding is a two-state reaction over a single transition state barrier, which is tilted by the work done on the molecule (13). In such a reaction driven by simple diffusion, the dwell times measured at a given force are exponentially distributed, with a decay rate governed by the barrier height. Moreover, the unfolding rate depends exponentially on the applied force. Most previous studies have interpreted their data using this two-state model to determine the height of the energy barrier and the distance to the transition state.
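For reference, the two-state picture above is commonly written in the standard Bell form. The block below uses symbols that are not defined in the text: k_0 (the zero-force unfolding rate) and \mathcal{F} (the applied force, written this way to avoid a clash with the unfolding probability F(t)); Δx is the distance to the transition state and k_BT the thermal energy.

```latex
k(\mathcal{F}) = k_0 \exp\!\left(\frac{\mathcal{F}\,\Delta x}{k_B T}\right),
\qquad
F(t) = 1 - e^{-k(\mathcal{F})\,t},
\qquad
f(t) = \frac{dF}{dt} = k(\mathcal{F})\,e^{-k(\mathcal{F})\,t}.
```

At a fixed force the dwell times are then exponentially distributed with a single rate k(\mathcal{F}), which is the baseline that the non-exponential models discussed next depart from.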

Apart from the two-state fitting of the unfolding kinetics of ubiquitin (9), more recent work has shown that a larger statistical pool of dwell times at a given force reveals important deviations from exponential kinetics that require more sophisticated modeling. Surprisingly, these deviations have led to three alternative models with different physical interpretations for the unfolding of ubiquitin pulled under the same experimental conditions. The first interpretation considers unfolding via multiple pathways in a rough energy landscape, where the timescale of interconversion between the folded states is assumed to be slow compared to that of unfolding. This scenario predicts that the non-exponential dwell times at a force of 110 pN are consistent with exponentially distributed free energy barriers (8). By contrast, a more recent study assumes that the static disorder (14, 15) has a Gaussian distribution of barriers and derives the corresponding function to fit the experimental dwell times over a range of constant forces (10). Alternatively, assuming that the Gaussian distribution arises from noise in the applied force (11) leads to the same form of the non-exponential fitting function for the dwell times, provided that the noise correlation time is longer than the unfolding time. In addition to these physical interpretations, a log-normal distribution has been proposed as the best heuristic fit to the dwell times of both monomeric and polyubiquitin data (12).

A possible explanation for the apparent success of these four models in fitting the same data is that rigorous methods of analyzing and assessing force-clamp trajectories are lacking. For example, some studies average and normalize the measured end-to-end length trajectories as an estimate of the cumulative unfolding probability, while others export the individual dwell times and bin them into probability density distributions before fitting. Moreover, since the polyprotein chains vary in length and detach from the cantilever at random times, not all events are necessarily observed in the experiment. To account for the undetected events, different filtering methods are applied to the data, each with its own associated uncertainties. In this paper we quantitatively assess the errors in existing analysis protocols, develop new analysis methods that systematically take into account biases introduced by experimental artifacts, and evaluate the success of each model using not only graphical tests but also rigorous statistical tests based on maximum likelihood estimation and Bayesian sampling. We show that tests of robustness against filtering the data provide an excellent indication of the validity of the underlying model, and we illustrate the results on both real and synthetic data sets. To use the full experimental data set and avoid filtering, we additionally derive a likelihood function that gives the probability of observing a sequence of dwell times followed by the measured detachment time of the molecule. This method allows us to rank the proposed models by their consistency with the data using standard statistical tests, such as the Kuiper test (16, 17). Finally, we show that the filtering techniques and the likelihood function agree, and we propose a self-consistent recipe for data assessment in future experiments.

Distinguishing between fitting functions matters because it determines the physical picture of protein unfolding, which sets the mechanical response timescales in biology. Indeed, it is striking that the mean unfolding times for the four proposed distributions for ubiquitin range from seconds to hundreds of seconds at a given constant force, which emphasizes the importance of determining the correct model.



Materials and Methods 

Force-clamp spectroscopy measurements are taken using the same AFM, ubiquitin polyprotein construct and experimental method described previously (…). In response to a constant stretching force, each of the protein domains in a polyprotein chain unfolds stochastically, leading to a stepwise elongation of the end-to-end length over time, as shown in the example in Fig. 1A. Time zero is marked at the beginning of the first plateau in the end-to-end length after the constant stretching force of 110 pN is applied. The resulting staircase of unfolding events yields a set of dwell times t_1, t_2, ..., which mark the transition of each domain from the native state to the fully extended unfolded state. Only staircases with at least 3 steps are included in the analysis, as the signature of a single polyprotein molecule. Plotting over 2000 unfolding times in the order in which they were collected gives the scatter plot in Fig. 1B. The logarithmic scale emphasizes that the unfolding times span three orders of magnitude, while the homogeneity of the data from experiments performed with distinct cantilevers and on different days supports the validity of the force calibration and the stability of the protein, respectively.

Results and Discussion 

Unbiasing the unfolding data from experimental artifacts 

To determine the probability F(t) that a domain has unfolded by time t, a common way to analyze time series data is to plot the cumulative distribution function (CDF) of the dwell times. Experimentally, this CDF is often constructed by averaging and normalizing the raw staircases, but this method gives an approximation of the CDF that is not monotonically increasing because of thermal noise and occasional drift in the experiment. Instead, the correct way to construct the CDF is to export the dwell times directly, sort and rank them from smallest to largest, and plot the normalized rank against the dwell time as the empirical CDF. This procedure avoids the loss of information caused by binning, since the empirical CDF has a value at each measured dwell time.
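A minimal sketch of this sort-and-rank construction in Python with NumPy; the function name `empirical_cdf` is illustrative and not taken from the original analysis. It evaluates the CDF just after each jump, i.e. the fraction of dwell times ≤ t_j, whereas the strict inequality of Eq. (1) gives the value just before the jump, (j − 1)/N.

```python
import numpy as np

def empirical_cdf(dwell_times):
    """Return (t_sorted, p_hat): sorted dwell times and their normalized ranks.

    p_hat[j] is the fraction of dwell times <= t_sorted[j], i.e. the empirical
    CDF evaluated at every measured dwell time, with no binning.
    """
    t_sorted = np.sort(np.asarray(dwell_times, dtype=float))
    n = t_sorted.size
    p_hat = np.arange(1, n + 1) / n  # ranks 1..N divided by N
    return t_sorted, p_hat

# Made-up dwell times in seconds, listed in collection order
t, p = empirical_cdf([0.42, 0.011, 3.7, 0.09])
# t -> 0.011, 0.09, 0.42, 3.7 ; p -> 0.25, 0.5, 0.75, 1.0
```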

However, in the case of force-clamp trajectories the empirical CDF of all the observed dwell times does not coincide with the unfolding probability F(t), because of experimental artifacts. The experimental window, fixed by the time resolution at short times, t_min, and by the total duration of the experiment, t_max, may not encompass the whole range of the unfolding probability F(t). The empirical CDF, given by

$$\hat{P}(t) = \frac{\#\{\text{dwell times} < t\}}{N}, \qquad (1)$$

where N is the total number of dwell times in the data set and #{dwell times < t} denotes the number of such times that are less than t, must therefore be fit with a distribution P(t) that is conditional on the time range of the experiment (18). While F(t) is zero at time zero and reaches one at infinity, the conditional P(t) is fixed to zero at t_min in our experiments, reaches one at t_max, and is defined as

$$P(t) = \begin{cases} 0 & \text{if } t < t_{\min}, \\[4pt] \dfrac{F(t) - F(t_{\min})}{F(t_{\max}) - F(t_{\min})} & \text{if } t_{\min} \le t \le t_{\max}, \\[4pt] 1 & \text{if } t > t_{\max}. \end{cases} \qquad (2)$$

Note that this conditional fitting of the data fixes the values of F(t) at t_min and t_max without the need to introduce extra parameters. Some previous studies introduce a normalization constant A as an extra fitting parameter, which may unjustifiably improve the agreement of the fit with the proposed theory. The functional form of F(t) chosen for the fitting procedure self-consistently determines the range captured by the data, as shown in Fig. 2A. If the experiment lasts long enough that F(t) approaches one at t_max, the conditioning has little effect on the parameters. However, even in cases where F(t) reaches 0.85 at t_max, neglecting the conditioning can alter the rate of an exponential function by 27% and change the shape of the distribution (see Fig. S1 in the Supplementary Materials).
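The effect of conditioning can be reproduced on synthetic data. The sketch below is a toy illustration under stated assumptions: it draws exponential dwell times with a hypothetical rate a = 1 s⁻¹, keeps those below the t_max at which F(t_max) = 0.85 (taking t_min = 0 for simplicity), and fits the same empirical CDF by least squares with `scipy.optimize.curve_fit`, once with F(t) directly and once with the conditional P(t) of Eq. (2). The naive fit overestimates the rate; the size of the bias in this toy setting need not match the 27% quoted above.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
a_true = 1.0                                  # hypothetical unfolding rate (1/s)
t_min = 0.0                                   # no short-time cutoff in this toy example
t_max = -np.log(1.0 - 0.85) / a_true          # window end chosen so that F(t_max) = 0.85

# Exponential dwell times, keeping only those that fall inside the window
samples = rng.exponential(1.0 / a_true, size=20000)
t = np.sort(samples[(samples >= t_min) & (samples <= t_max)])
p_hat = np.arange(1, t.size + 1) / t.size     # empirical CDF; reaches 1 at t_max

def F_exp(t, a):
    return 1.0 - np.exp(-a * t)

def P_conditional(t, a):
    # Eq. (2): F rescaled to run from 0 at t_min to 1 at t_max
    return (F_exp(t, a) - F_exp(t_min, a)) / (F_exp(t_max, a) - F_exp(t_min, a))

(a_naive,), _ = curve_fit(F_exp, t, p_hat, p0=[0.5])         # ignores the window
(a_cond,), _ = curve_fit(P_conditional, t, p_hat, p0=[0.5])  # conditions on it
print(f"naive a = {a_naive:.2f} 1/s, conditional a = {a_cond:.2f} 1/s")
```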

Another artifact of force-clamp trajectories is that the molecules detach from the cantilever at random times t_d, which implies that some events are not observed in the experiment. If the total number of domains N in the polyprotein chain were known a priori (…), one could unbias the distribution of dwell times using order statistics, assuming that the unfolding events are independent of one another (11). However, in our experiments the cantilever picks up polyproteins at random points on the surface, such that any N (up to the full length N_max) can be exposed to a stretching force in a given experiment. This makes the unbiasing procedure difficult, because different distributions p(N) bias the empirical P̂(t) differently, as illustrated on synthetic examples in Fig. S2 in the Supplementary Materials. It is therefore necessary to filter the data such that all events come from trajectories that correspond to the same time window in the experiment, in order to construct the P̂(t) that corresponds to the underlying F(t). The correct way to do so is to choose an experimental time window (e.g. from t_min to a cutting time t_c) and only consider those dwell times that (i) occur within that range and (ii) come from trajectories that lasted over the entire range, such that t_d > t_c, as shown in Fig. S3 in the Supplementary Materials. Note that filtering the data by the detachment time alone, i.e. keeping all dwell times less than t_d, leads to empirical CDFs that give inaccurate values of the fitting parameters, as shown in Fig. S4 in the Supplementary Materials.
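A minimal sketch of criteria (i) and (ii), assuming each trajectory is stored as a pair (dwell times, t_d); the function name `filter_by_window` and this data layout are hypothetical.

```python
def filter_by_window(trajectories, t_min, t_c):
    """Keep dwell times in [t_min, t_c] from trajectories with t_d > t_c.

    trajectories: iterable of (dwell_times, t_d) pairs, where dwell_times are
    the unfolding times of one staircase (measured from t_0) and t_d is the
    detachment time of that molecule.
    """
    kept = []
    for dwell_times, t_d in trajectories:
        if t_d <= t_c:
            continue  # criterion (ii): the molecule did not last the whole window
        # criterion (i): only events inside the window
        kept.extend(t for t in dwell_times if t_min <= t <= t_c)
    return kept

# Hypothetical example with t_min = 0.005 s and t_c = 5 s
trajs = [
    ([0.02, 0.4, 1.1, 6.0], 7.5),  # lasts the window; the 6.0 s event lies outside it
    ([0.01, 0.3, 2.0], 3.0),       # detaches before t_c: discarded entirely
    ([0.003, 0.8, 4.9], 5.2),      # 0.003 s is below the time resolution
]
print(filter_by_window(trajs, 0.005, 5.0))  # [0.02, 0.4, 1.1, 0.8, 4.9]
```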
Graphical tests of the unfolding probability F(t)

Using the methods described above for filtering and fitting the experimental CDF P̂(t), we assess how well the different models explain the ubiquitin data. The experimental time window is chosen to lie between the time resolution of the experiment, t_min = 5 ms, and the cutting time t_c = 5 s, which provides three decades over which to test the goodness of fit. The same empirical P̂(t) is then fit with Eq. (2) for the four functional forms of F(t) proposed in the literature and listed in Table 1. The fitting can be done by least squares or by maximum likelihood; the two methods yield parameters that agree to within two decimal places. Since the fitting procedure self-consistently fixes F(t_max) and F(t_min) for each function, the resulting empirical F(t) = (F(t_max) − F(t_min))P̂(t) + F(t_min), obtained by solving Eq. (2) for F(t), differ in their range, as shown in Fig. 2A. For instance, the experimental window captures only 60% of the events in the case of the log-normal distribution, while it covers almost all the events in the case of the exponential function. Moreover, the curves clearly show that the exponential fit is inaccurate, while the other three models are all in good agreement with the data on the linear scale and exhibit comparable χ² values. To zoom into the two decades of fast unfolding times, the inset shows the data plotted as the conditional P(t) on a log-log scale, which emphasizes deviations from the fits. Here it can be seen that the Weibull distribution performs better than all others on timescales below 0.1 s. Note that the Weibull F(t) would be nearly a straight line at short times on the scales of the inset, but P(t) is conditional on the time window of the experiment and thus exhibits curvature. Even though the Weibull distribution fits this data set most accurately, the statistical error in the experiment precludes determining the correct model by this graphical test alone.
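The conversion between the conditional and unconditional descriptions can be sketched as follows. The CDFs follow Table 1, except that the Weibull form F(t) = 1 − exp[−(at)^b] is an assumed parametrization (a in s⁻¹); the helper names are hypothetical. The loop uses the Table 1 MLE values, which come from the full-data likelihood rather than from the fits in Fig. 2A, so the captured fractions it prints need not match the percentages quoted above.

```python
import numpy as np
from scipy.special import erfc

# Candidate unfolding probabilities F(t); the Weibull form is an assumption.
def F_exponential(t, a):
    return 1.0 - np.exp(-a * t)

def F_lognormal(t, sigma, tau):
    return 0.5 * erfc(-np.log(t / tau) / (np.sqrt(2.0) * sigma))

def F_weibull(t, a, b):
    return 1.0 - np.exp(-(a * t) ** b)

def unconditional_from_conditional(p_hat, F, t_min, t_max):
    """Solve Eq. (2) for F: F(t) = (F(t_max) - F(t_min)) * P(t) + F(t_min).

    F is a one-argument callable (the fitted model); p_hat holds empirical
    CDF values inside the window [t_min, t_max].
    """
    lo, hi = F(t_min), F(t_max)
    return (hi - lo) * np.asarray(p_hat, dtype=float) + lo

# Fraction of each model's probability mass inside a 5 ms to 5 s window
t_min, t_max = 0.005, 5.0
for name, F in [
    ("exponential", lambda t: F_exponential(t, 0.66)),
    ("log-normal", lambda t: F_lognormal(t, 2.04, 1.26)),
    ("Weibull", lambda t: F_weibull(t, 0.59, 0.73)),
]:
    print(f"{name:12s} F(t_max) - F(t_min) = {F(t_max) - F(t_min):.2f}")
```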

Indeed, many functional forms (particularly those with several parameters) can fit a particular time window chosen for the analysis, but it is a greater challenge to assess how robust the fitting function and its parameters are against filtering the same data over different time windows. The shorter the time window, the more data points are needed to obtain the same statistical accuracy in the fit, as shown in the synthetic example in Fig. S5 in the Supplementary Materials. Nevertheless, given a large enough pool of data, there should exist a range of cutting times t_c over which the fitted parameters converge to the same values. As a test of the robustness of the parameters, we calculate the first moment of F(t) (i.e. the mean unfolding time) fitted at different values of t_c, shown in Fig. 2B. Filtering the data at any time above 2.5 s has little effect on the mean unfolding time for the Weibull distribution, while for the Gaussian disorder and log-normal distributions it varies greatly with t_c. The mean unfolding time is plotted on a logarithmic scale to capture the three-orders-of-magnitude spread predicted from different experimental time windows of the same pool of data. This result shows that the choice of physical model has dramatic consequences for biological function, since the characteristic protein unfolding time varies from 1 second to 3 minutes.
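The first moments used in this test have closed forms for three of the models. The sketch below assumes the Weibull parametrization F(t) = 1 − exp[−(at)^b], for which the mean is Γ(1 + 1/b)/a; the log-normal mean τ e^{σ²/2} grows steeply with σ, which is one plausible reason its mean is so sensitive to t_c. Function names are hypothetical, and the printed values use the Table 1 MLE parameters rather than the t_c-filtered fits of Fig. 2B.

```python
import math

# Mean unfolding time (first moment) of each F(t)
def mean_exponential(a):
    return 1.0 / a

def mean_weibull(a, b):
    # assumes F(t) = 1 - exp[-(a t)^b]
    return math.gamma(1.0 + 1.0 / b) / a

def mean_lognormal(sigma, tau):
    return tau * math.exp(sigma ** 2 / 2.0)

print(f"exponential: {mean_exponential(0.66):.2f} s")
print(f"Weibull:     {mean_weibull(0.59, 0.73):.2f} s")
print(f"log-normal:  {mean_lognormal(2.04, 1.26):.2f} s")
```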

While it can be argued that the statistical pool of filtered data, shown in the inset, is insufficient to fit F(t) at short cutting times t_c, the lack of convergence over any significant filtering range for the Gaussian disorder and log-normal functions calls into question their validity in describing the data. On the other hand, the exponential distribution does exhibit a range of stability above t_c ≈ 3 s, but its poor fit to the data invalidates its use for a different reason. This analysis shows that a successful model must not only fit the data faithfully over a range of t_c, but also yield parameters that are stable over that range.

Instead of using the least squares method to assess the goodness of fit and estimating the variance in the parameters by bootstrapping, other approaches work equally well. One such method is Maximum Likelihood Estimation (MLE) (19), which computes the most likely parameters of a distribution given a set of observations, in our case the dwell times. The variance in the parameters is then obtained by Bayesian sampling of the data set (20), as shown in Fig. S6 in the Supplementary Materials. The mean values of the parameters a and b of the Weibull distribution and of k_F and σ of the Gaussian disorder distribution are shown as a function of the experimental time window t_c in Fig. 3A and 3B, respectively. The inset shows that the root mean square deviation (RMSD) of the fitting parameters decreases with t_c, which is consistent with the concomitant increase in the number of data points and the wider time window of the fit. While the Weibull parameters converge to stable values above t_c ≈ 3 s, those of the Gaussian disorder model do not settle to any given values before t_c ≈ 7 s, which is also reflected in the broad fluctuations of the mean unfolding time shown in Fig. 2B. Note that filtering at long t_c uses as little as 10% of the data collected, as shown in the inset. Disregarding the majority of the data set is never desirable for an experimentalist.
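A sketch of this MLE-plus-sampling step for a single window [t_min, t_c], under assumptions not stated in the text: the Weibull parametrization F(t) = 1 − exp[−(at)^b], a flat prior in (log a, log b), and a random-walk Metropolis sampler with a hypothetical step size and burn-in. The window enters the likelihood through the normalization F(t_c) − F(t_min), mirroring Eq. (2). Repeating the fit over a grid of t_c values would give curves like those in Fig. 3A.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed Weibull parametrization: F(t) = 1 - exp[-(a t)^b], a in 1/s.
def weibull_cdf(t, a, b):
    return 1.0 - np.exp(-(a * t) ** b)

def weibull_logpdf(t, a, b):
    # log f(t), with f(t) = dF/dt = b a^b t^(b-1) exp[-(a t)^b]
    return np.log(b) + b * np.log(a) + (b - 1.0) * np.log(t) - (a * t) ** b

def truncated_loglik(log_params, t, t_min, t_c):
    """Log-likelihood of dwell times t observed inside the window [t_min, t_c]."""
    a, b = np.exp(log_params)  # work in log space so that a, b stay positive
    norm = weibull_cdf(t_c, a, b) - weibull_cdf(t_min, a, b)
    if norm <= 0.0:
        return -np.inf
    return np.sum(weibull_logpdf(t, a, b)) - t.size * np.log(norm)

def fit_and_sample(t, t_min, t_c, n_steps=4000, step=0.05, seed=0):
    """MLE, then random-walk Metropolis with a flat prior in (log a, log b)."""
    neg_ll = lambda p: -truncated_loglik(p, t, t_min, t_c)
    res = minimize(neg_ll, x0=np.log([0.5, 0.8]), method="Nelder-Mead")
    rng = np.random.default_rng(seed)
    current, current_ll = res.x, -res.fun
    samples = []
    for _ in range(n_steps):
        proposal = current + step * rng.standard_normal(2)
        proposal_ll = truncated_loglik(proposal, t, t_min, t_c)
        # Metropolis acceptance rule for a symmetric proposal
        if np.log(rng.uniform()) < proposal_ll - current_ll:
            current, current_ll = proposal, proposal_ll
        samples.append(np.exp(current))
    samples = np.array(samples[n_steps // 4:])  # drop the first quarter as burn-in
    return np.exp(res.x), samples.mean(axis=0), samples.std(axis=0)

# Synthetic check with hypothetical parameters a = 0.6 1/s, b = 0.75
rng = np.random.default_rng(1)
a0, b0 = 0.6, 0.75
u = 1.0 - rng.uniform(size=5000)              # uniform on (0, 1]
t_all = (-np.log(u)) ** (1.0 / b0) / a0       # inverse-CDF sampling
t_min, t_c = 0.005, 5.0
t = t_all[(t_all >= t_min) & (t_all <= t_c)]
mle, post_mean, post_std = fit_and_sample(t, t_min, t_c)
print("MLE (a, b):", mle, "posterior mean:", post_mean, "posterior std:", post_std)
```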
Maximum likelihood function includes all collected data

As an alternative to fitting the CDF of the dwell times as a function of t_c, a likelihood function that takes into account the experimental features of force-clamp trajectories can use the whole data set to estimate the parameters of the unfolding model. In a typical pulling experiment, the cantilever picks up a polyprotein chain of N domains with probability p(N). These domains subsequently unfold at dwell times t_1, t_2, ..., t_k, where k corresponds to the last observed step in the staircase, with a minimum of k* = 3 steps required as the signature of a single molecule. Finally, the molecule detaches from either the tip or the surface at time t_d. Assuming that the dwell times are independent of one another (…) and identically distributed (21–23), we calculate, for every polyprotein chain, the probability of observing k unfolding events multiplied by the probability that the remaining N − k domains stay folded up to the detachment time t_d:

$$\mathcal{L}(t_1,\dots,t_k;\,t_d) = \frac{1}{G^*}\sum_{N=k}^{N^*} p(N)\,\frac{N!}{(N-k)!}\,\prod_{i=1}^{k} f(t_i)\,\left[1-F(t_d)\right]^{N-k}, \qquad (3)$$

where f(t) = dF/dt is the probability density associated with F(t), N* = 12 is the number of domains in the expressed protein construct, and G* normalizes for the exclusion of staircases with fewer than k* = 3 steps:

$$G^* = \sum_{N=k^*}^{N^*} p(N) \sum_{l=k^*}^{N} \binom{N}{l}\,\left[F(t_d)\right]^{l}\,\left[1-F(t_d)\right]^{N-l}. \qquad (4)$$


Taking the product of the likelihoods in Eq. (3) over all polyprotein chains gives the overall likelihood function. The parameters of the unfolding probability F(t), as well as those defining p(N) (assumed here to be a power law, p(N) ∝ N^(−γ), with decay exponent γ), are obtained by maximizing this likelihood function, while the uncertainties are estimated using Bayesian sampling.
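A direct transcription of Eqs. (3) and (4) for one staircase, written with plain loops for readability. Assumptions: the combinatorial factor N!/(N − k)! for the ordered dwell times t_1 < ... < t_k, a power-law p(N) supported on k*..N*, a hypothetical γ = 1.5, and the assumed Weibull form for the domain kinetics. The function names are illustrative; maximizing the summed log-likelihood over the parameters (for example with `scipy.optimize.minimize`) would complete the estimate.

```python
import numpy as np
from math import comb, factorial

def power_law_pN(gamma, k_star=3, N_star=12):
    """Hypothetical p(N) proportional to N^(-gamma) on N = k_star..N_star."""
    weights = {N: N ** (-gamma) for N in range(k_star, N_star + 1)}
    total = sum(weights.values())
    return {N: w / total for N, w in weights.items()}

def chain_log_likelihood(dwell_times, t_d, F, f, p, k_star=3, N_star=12):
    """Log of Eq. (3) for one staircase, including the normalization G* of Eq. (4).

    dwell_times: the k observed unfolding times t_1..t_k (k >= k_star);
    t_d: detachment time; F, f: single-domain CDF and density;
    p: dict mapping N to p(N) for N = k_star..N_star.
    """
    k = len(dwell_times)
    Fd = F(t_d)
    prod_f = np.prod([f(t) for t in dwell_times])
    # Eq. (3): k ordered unfoldings observed, N - k domains still folded at t_d
    numerator = sum(
        p[N] * factorial(N) / factorial(N - k) * prod_f * (1.0 - Fd) ** (N - k)
        for N in range(k, N_star + 1)
    )
    # Eq. (4): probability that a chain shows at least k_star steps before t_d
    G_star = sum(
        p[N] * sum(comb(N, l) * Fd ** l * (1.0 - Fd) ** (N - l) for l in range(k_star, N + 1))
        for N in range(k_star, N_star + 1)
    )
    return np.log(numerator) - np.log(G_star)

# Usage with hypothetical Weibull kinetics and gamma = 1.5
a, b = 0.59, 0.73
F = lambda t: 1.0 - np.exp(-(a * t) ** b)
f = lambda t: b * a ** b * t ** (b - 1.0) * np.exp(-(a * t) ** b)
p = power_law_pN(gamma=1.5)
chains = [([0.05, 0.6, 2.3], 4.0), ([0.01, 0.2, 0.9, 1.7, 3.1], 9.5)]
total = sum(chain_log_likelihood(ts, td, F, f, p) for ts, td in chains)
print(f"overall log-likelihood: {total:.3f}")
```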

The maximum value of the likelihood function for the ubiquitin data set ranks the four proposed unfolding distributions in the following order, from highest to lowest likelihood: Weibull, Gaussian disorder, log-normal and exponential. Given that the actual values of the likelihoods depend on the size of the data set and on F(t), this ranking only indicates which distribution is more consistent with the data; it cannot assess the accuracy of the fits themselves. Nevertheless, the fact that the Weibull parameters from the likelihood function, also shown in Fig. 3A, agree well with the converged parameters of the F(t) fits above t_c ≈ 3 s gives further support to this model. Conversely, the lack of such agreement for the runner-up Gaussian disorder model suggests that it is not the correct functional form for the unfolding probability.

A statistical test that quantitatively assesses whether a set of observations originates from a given distribution is the Kolmogorov-Smirnov approach with Kuiper's modification (16). We therefore compare the empirical CDFs at different t_c from the data set with those generated from the four functional forms of F(t), using the parameters estimated by the likelihood function above. Denoting, as before, by P(t) the postulated distribution and by P̂(t) the experimental distribution, the Kuiper statistic is defined as

$$U = Q_{KP}\!\left(\sqrt{N}\,\max_{j}\left[\hat{P}(t_j) - P(t_j)\right] - \sqrt{N}\,\min_{j}\left[\hat{P}(t_j^-) - P(t_j)\right]\right), \qquad (5)$$

where Q_KP is the Kuiper significance function, P̂(t_j^-) is the value of the empirical CDF just before its jump at t_j, and the maximum and minimum are taken over all N dwell times t_1, ..., t_N in the data set. U = 1 signifies a perfect match. The results in Fig. 4 show that the Weibull distribution is closest to 1 over almost the entire range of t_c. While the Gaussian disorder model is slightly closer to 1 for 7.2 s < t_c < 8 s, this narrow range is based on less than 10% of the collected data.
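A sketch of the test, assuming U is the Kuiper significance Q_KP of the √N-scaled statistic, which is the reading under which U = 1 corresponds to a perfect match. Q_KP(λ) = 2 Σ_j (4j²λ² − 1) exp(−2j²λ²) is the standard asymptotic series; the cutoff that returns 1 for λ < 0.4 (where the series converges slowly and Q_KP is indistinguishable from 1) and the omission of finite-N corrections are choices made for this sketch. Function names are hypothetical.

```python
import numpy as np

def q_kp(lam, terms=100):
    """Kuiper significance Q_KP(lam) = 2 sum_j (4 j^2 lam^2 - 1) exp(-2 j^2 lam^2)."""
    if lam < 0.4:
        return 1.0  # series converges slowly here; Q_KP is essentially 1
    j = np.arange(1, terms + 1)
    return float(2.0 * np.sum((4 * j**2 * lam**2 - 1) * np.exp(-2 * j**2 * lam**2)))

def kuiper_U(dwell_times, P):
    """U = Q_KP(sqrt(N) * (D_plus + D_minus)) for a model CDF P (vectorized callable)."""
    t = np.sort(np.asarray(dwell_times, dtype=float))
    N = t.size
    model = P(t)
    d_plus = np.max(np.arange(1, N + 1) / N - model)  # max_j [P_hat(t_j) - P(t_j)]
    d_minus = -np.min(np.arange(0, N) / N - model)    # -min_j [P_hat(t_j^-) - P(t_j)]
    return q_kp(np.sqrt(N) * (d_plus + d_minus))

# Toy check against a uniform CDF on [0, 1]
N = 100
good = (np.arange(N) + 0.5) / N      # evenly spread points: near-perfect agreement
bad = np.linspace(0.9, 0.95, N)      # all points bunched near 1: strong disagreement
print(kuiper_U(good, lambda t: t), kuiper_U(bad, lambda t: t))
```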

Comparison with synthetic and other data sets 

To further test the consistency of the ubiquitin data with the Weibull and Gaussian disorder models, we generate two synthetic data sets, matching the size of the experimental data set, using the parameters obtained from the maximum likelihood function in Eq. (3). We then filter the synthetic and experimental data sets by t_c and compare the values of their fitting parameters using the Weibull distribution in Fig. 5A,B and the Gaussian disorder model in Fig. 5C,D. In all cases, we find that the experimental and synthetic Weibull data agree well with each other above t_c = 3 s, while the synthetic Gaussian disorder data exhibit significant deviations. The experimental and synthetic Weibull data sets not only exhibit comparable fluctuations in the Weibull fitting parameters arising from statistical errors, but also follow similar trends in their discrepancy from the fitting parameters of the Gaussian disorder model. All these results are consistent with the hypothesis that the ubiquitin unfolding data at 110 pN most likely originate from a Weibull distribution.
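A sketch of how a synthetic Weibull data set with the same staircase structure could be generated. The text does not specify how detachment times were modeled, so the exponential detachment with a hypothetical mean t_d_mean, the power-law p(N) with hypothetical γ, and the Weibull form F(t) = 1 − exp[−(at)^b] are all assumptions; only the Weibull case is shown. The resulting (dwell times, t_d) pairs can then be filtered by t_c and fitted exactly like the experimental data.

```python
import numpy as np

def synthetic_staircases(n_chains, a, b, gamma, t_d_mean, k_star=3, N_star=12, seed=0):
    """Generate hypothetical force-clamp staircases with Weibull domain kinetics.

    Assumptions (illustrative only): chain length N ~ p(N) proportional to
    N^-gamma on k_star..N_star, i.i.d. domains with F(t) = 1 - exp[-(a t)^b],
    and an exponentially distributed detachment time with mean t_d_mean.
    """
    rng = np.random.default_rng(seed)
    Ns = np.arange(k_star, N_star + 1)
    pN = Ns.astype(float) ** (-gamma)
    pN /= pN.sum()
    chains = []
    while len(chains) < n_chains:
        N = rng.choice(Ns, p=pN)
        u = 1.0 - rng.uniform(size=N)                    # uniform on (0, 1]
        times = np.sort((-np.log(u)) ** (1.0 / b) / a)   # inverse-CDF Weibull sampling
        t_d = rng.exponential(t_d_mean)
        observed = times[times < t_d]                    # events after detachment are lost
        if observed.size >= k_star:                      # keep staircases with >= k_star steps
            chains.append((observed.tolist(), t_d))
    return chains

chains = synthetic_staircases(500, a=0.59, b=0.73, gamma=1.5, t_d_mean=10.0)
print(len(chains), chains[0])
```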

The fact that the Gaussian disorder model does not agree with the data contradicts the theories of static disorder (10) and force noise (24), since they imply the same fitting function. While the former places the Gaussian noise in the barriers to unfolding, the latter places it in the constant force applied by the cantilever. Given that σ = σ_F Δx, where σ_F is the noise in the applied force and Δx = 0.23 nm is the distance to the transition state, the value of σ in the barriers obtained from the MLE function translates into a 21% error in the force calibration (Table 1), while the value of the parameter that is most stable against filtering gives 32%. If this functional form had fit the data well, these estimated errors in the force calibration, being much higher than the measured error of ≈5% (25), would have favored the scenario of static disorder in the ubiquitin free energy landscape over that of force noise.
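The 21% figure can be reproduced from Table 1 by inverting σ = σ_F Δx and dividing by the 110 pN set point; the short sketch below does this arithmetic for both Gaussian disorder widths in the table (the 32% value comes from the filtering analysis and is not reproduced here). The helper name is hypothetical.

```python
def force_noise_from_disorder(sigma, dx=0.23, force=110.0):
    """Return (sigma_F in pN, sigma_F as a fraction of the applied force).

    sigma: width of the Gaussian disorder in pN nm; dx: distance to the
    transition state in nm; force: applied force in pN.
    """
    sigma_F = sigma / dx
    return sigma_F, sigma_F / force

for label, sigma in [("previous study (10)", 3.47), ("MLE, this study", 5.32)]:
    sigma_F, frac = force_noise_from_disorder(sigma)
    print(f"{label}: sigma_F = {sigma_F:.2f} pN, {100 * frac:.0f}% of 110 pN")
# previous study (10): sigma_F = 15.09 pN, 14% of 110 pN
# MLE, this study: sigma_F = 23.13 pN, 21% of 110 pN
```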

There are several reasons why our results do not agree with those published in the literature, and why the published results also disagree with each other. First, a common mistake in fitting force-clamp data is to introduce a normalization constant as an extra fitting parameter. Instead, care must be taken to fit the conditional unfolding probability distribution over the experimental window with P(t) and to obtain F(t) using Eq. (2). Second, binning the distribution of unfolding times should be avoided, as it effectively introduces an extra parameter into the fit and loses resolution at short unfolding times. Third, filtering the data by accepting those trajectories that last beyond a set minimum detachment time t_d, and including events that occur after that time in P̂(t), biases the resulting distribution at long times, which in turn skews the fitting parameters. Instead, one must only include those dwell times that occur within exactly the same time window. Finally, plotting and fitting data on log-log scales can be useful once the data have been shown to be well described by a stretched exponential function, since a straight-line fit can then be used. Otherwise, the compression of the data may hide deviations from view, and the fits require further assessment using MLE and Bayesian sampling.
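For the assumed Weibull form F(t) = 1 − exp[−(at)^b], one common straight-line representation is ln[−ln(1 − F)] = b ln t + b ln a, so a linear fit recovers both parameters. The sketch below illustrates this on exact, noise-free values; in line with the caveat above, it is only meaningful once the stretched exponential has been validated, and on empirical data the transform strongly amplifies noise as F approaches 1. The function name is hypothetical.

```python
import numpy as np

def weibull_plot_fit(t, F_values):
    """Straight-line fit of ln[-ln(1 - F)] against ln t.

    For F(t) = 1 - exp[-(a t)^b] (assumed parametrization) the line has
    slope b and intercept b ln a.
    """
    x = np.log(np.asarray(t, dtype=float))
    y = np.log(-np.log(1.0 - np.asarray(F_values, dtype=float)))
    slope, intercept = np.polyfit(x, y, 1)  # highest power first
    b = slope
    a = np.exp(intercept / b)
    return a, b

t = np.logspace(-2, 1, 20)
F_exact = 1.0 - np.exp(-(0.59 * t) ** 0.73)
print(weibull_plot_fit(t, F_exact))  # recovers (0.59, 0.73) up to rounding
```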

Conclusion 

Numerous force-clamp analysis methods, namely the fitting of filtered cumulative dwell time distributions, the convergence of fitting parameters with an expanding time window of the experiment, the prediction of the maximum likelihood function for the whole data set, the Kuiper test and the comparison with synthetically generated data sets, consistently show that the data most likely arise from an underlying Weibull distribution, otherwise known as the stretched exponential distribution. This type of kinetics has been observed in DNA relaxation (26), thermally induced protein folding (27, 28), protein binding (29) and conformational dynamics in solution (30). Microscopically, the stretched exponential has been attributed to multiple pathways in the protein landscape (31) or to memory effects (32). Our results show that such complexities may also play a role in the response of a protein to a pulling force at the single-molecule level.

One possible interpretation is that the unfolding events can occur via many (random) pathways, each with a different rate α, and that the distribution of unfolding times is obtained by superposing the exponential decays of these pathways. For example, the stretched exponential corresponds to rates distributed according to the Lévy distribution, since its probability density p(α) is defined implicitly via

$$\int_0^\infty \left(1 - e^{-\alpha t}\right) p(\alpha)\, d\alpha = 1 - e^{-(a t)^{b}} = F(t), \qquad (6)$$

where p(α) cannot be written in closed analytical form but exhibits a power law ∝ α^(−γ) at large α. Therefore, the stretched exponential fitting function is in agreement with the theoretical model used in (8) to fit the ubiquitin unfolding kinetics. By contrast, the Gaussian distribution of energies or force noise proposed in (10) corresponds, via the Arrhenius assumption, to a log-normal distribution of the rates in Eq. (6).
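Eq. (6) can be checked numerically in the one case where p(α) has a closed form: for b = 1/2 the one-sided stable law is the Lévy distribution with scale c, whose Laplace transform is exp(−√(2ct)), so the superposition of exponential decays equals 1 − exp[−(at)^{1/2}] with a = 2c. The sketch assumes the Weibull form (at)^b, a hypothetical c = 1 s⁻¹, and uses `scipy.stats.levy` with `scipy.integrate.quad`; it does not cover general b.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import levy

# Special case b = 1/2 of Eq. (6): Levy-distributed rates with scale c give
# E[exp(-alpha t)] = exp(-sqrt(2 c t)), i.e. a stretched exponential with a = 2c.
c = 1.0                  # hypothetical scale of the rate distribution (1/s)
a, b = 2.0 * c, 0.5

def F_superposition(t):
    # F(t) = 1 - integral of exp(-alpha t) p(alpha) over alpha in (0, inf)
    laplace, _ = quad(lambda alpha: np.exp(-alpha * t) * levy.pdf(alpha, scale=c), 0, np.inf)
    return 1.0 - laplace

for t in (0.1, 1.0, 5.0):
    print(t, F_superposition(t), 1.0 - np.exp(-(a * t) ** b))
```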

These methods offer a way to verify the accuracy of previous results and provide a statistical toolbox for the analysis of future force-clamp studies. Moreover, these techniques can be extended to take into account the particularities of a given experiment. For example, it is possible to introduce correlations between the domains within the likelihood function, or to assume a known p(N) in the case of pre-pulled proteins. More generally, this type of analysis can be applied to other types of force-clamp measurements, such as disulfide bond rupture kinetics (33) or the dissociation of quaternary interactions between individual domains (34).
Distribution | F(t) | Previous studies | MLE parameters
Exponential | 1 − e^(−at) | a ≈ 0.67 s⁻¹ (9) | a = 0.66 ± 0.02 s⁻¹
Log-normal | ½ erfc[−ln(t/τ)/(√2 σ)] | σ = 3.0, τ = 0.005 s (12) | σ = 2.04 ± 0.05, τ = 1.26 ± 1.08 s
Gaussian disorder (GD) | | k_F = 0.73 ± 0.03 s⁻¹, σ = 3.47 ± 1.16 pN nm (10) | k_F = 0.57 ± 0.05 s⁻¹, σ = 5.32 ± 0.72 pN nm
Force noise (= GD with σ = σ_F Δx) | | Δx = 0.23 nm, σ_F = 15.09 ± 5.04 pN | Δx = 0.23 nm, σ_F = 23.13 ± 3.13 pN
Weibull | 1 − e^(−(at)^b) | a ≈ 0.9 s⁻¹, b = γ − 1 = 0.8 (8) | a = 0.59 ± 0.04 s⁻¹, b = 0.73 ± 0.02

Table 1: Parameter values from previous studies and from this study for the different distributions applied to a data set of ubiquitin pulled at a constant force of 110 pN.
Figure Legends

Figure 1 

(A) A typical force-clamp unfolding trajectory of a single ubiquitin polyprotein pulled with a constant stretching force of 110 pN. The beginning of the plateau that precedes the staircase of unfolding events marks time zero, t_0, as the moment when the molecule is held taut under the applied force. The dwell times are then measured as the time intervals between t_0 and each of the unfolding steps. Finally, the molecule detaches at t_d. The stepwise unfolding is illustrated in the schematic diagram. (B) Unfolding dwell times from the staircases are plotted on a semi-log scale in the order in which they were collected and show a broad and homogeneous distribution of times.

Figure 2 

(A) The unfolding probability F(t) for the four models proposed in the literature is used to fit the same empirical CDF of dwell times. The normalization of each F(t) leads to different timescales over which the data unfold. The inset shows the corresponding conditional P(t) on a log-log plot to emphasize the goodness of fit at short times. (B) Changing the time window from 5 s in (A) to t_c reveals the variability in the characteristic unfolding time between the different models. The values span more than three orders of magnitude, and only the Weibull and exponential distributions settle to a stable value. The inset shows how the number of data points changes as the time window is expanded.

Figure 3 

Estimates of the fitting parameters of the Weibull (A) and Gaussian disorder (B) distributions as a function of the experimental time window. Bayesian sampling shows that the fluctuations around the mean values of the parameters diminish as the time window increases. The constant solid lines are the parameter values obtained from the maximum likelihood function in Eq. (3), and the dashed lines show their standard deviation.

Figure 4 

A Kuiper statistic of 1 signifies a perfect match between the experimental data and the proposed distribution. Deviations from the line at 1 quantify the disagreement between the maximum likelihood estimates for the four models and the experimental data set as a function of the experimental time window t_c.

Figure 5 

Comparison between the experimental ubiquitin data set and synthetic data sets generated using the parameters from the maximum likelihood function for the Weibull and Gaussian disorder distributions. The constant solid lines are the parameter values obtained from the maximum likelihood function in Eq. (3), and the dashed lines show their standard deviation. Fitting the three data sets with the Weibull distribution gives the fluctuations in parameter a in (A) and b in (B), and fitting with the Gaussian disorder distribution gives k_F in (C) and σ in (D). While the ubiquitin data and the synthetic Weibull data behave similarly above t_c = 3 s in all cases, the synthetic Gaussian disorder data differ significantly.



